Cross-genre Feature Comparisons for Spoken Sentence Segmentation

Authors

  • SEBASTIEN CUENDET
  • DILEK HAKKANI-TUR
  • JAMES FUNG
  • BENOIT FAVRE
  • ELIZABETH SHRIBERG
Abstract

Automatic sentence segmentation of spoken language is an important precursor to downstream natural language processing. Previous studies combine lexical and prosodic features, but can impose significant computational challenges because of the large size of feature sets. Little is understood about which features most benefit performance, particularly for speech data from different speaking styles. We compare sentence segmentation for speech from broadcast news versus natural multi-party meetings, using identical lexical and prosodic feature sets across genres. Results based on boosting and forward selection for this task show that (1) feature sets can be reduced with little or no loss in performance, and (2) the contribution of different feature types differs significantly by genre. We conclude that more efficient approaches to sentence segmentation and similar tasks can be achieved, especially if genre differences are taken into account.

Similar resources

Syntactic Feature of EFL Speakers’ Conference Presentations: The Case of Passive Voice and Pseudo-Cleft

Acquiring proficiency in academic genres is a key factor in research community. Among various genres in academic discourse communities, spoken genre, especially Conference Presentations (CPs), play a crucial role in research communities, though investigation on this important genre is in its infancy or is relatively under-researched. Therefore, the present study aims to shed light on the import...

Automatic Segmentation for Emotional Feature Extraction from Spoken Sentence

Perception of speaker’s emotion is one of interesting issues in human-robot interaction. Especially, friendly and instinctive interface between robots and humans is required for making service robots useful to inexpert interacting with robots. Among several mode in communications, speech is easiest method for human because speech is fundamental communication tool in human-human interaction. How...

An Open Source Prosodic Feature Extraction Tool

There has been an increasing interest in utilizing a wide variety of knowledge sources in order to perform automatic tagging of speech events, such as sentence boundaries and dialogue acts. In addition to the word spoken, the prosodic content of the speech has been proved quite valuable in a variety of spoken language processing tasks such as sentence segmentation and tagging, disfluency detect...

Orthographic variations and visual information processing.

Based upon an analysis of how graphemic symbols are mapped onto spoken languages, three distinctive writing systems with three different relations between script and speech relationships are identified. They are logography, syllabary, and alphabet, developed sequentially in the history of mankind. It is noted that this trend of development seems to coincide with the trend of cognitive developme...

Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries

Spoken monologues feature greater sentence length and structural complexity than do spoken dialogues. To achieve high parsing performance for spoken monologues, it could prove effective to simplify the structure by dividing a sentence into suitable language units. This paper proposes a method for dependency parsing of Japanese monologues based on sentence segmentation. In this method, the depen...

Publication date: 2007